Context

The US airline carrier 'Falcon Airlines' has seen its sales decline over the years, even though demand across the airline industry keeps growing. It was therefore imperative for the marketing department to conduct a survey of 90,917 individuals who had travelled with the airline. The survey measured their level of satisfaction with the service provided, the facilities, and the technology, with the aim of delivering a better, safer, and more pleasant experience to customers.

To that end, the company defined a set of parameters considered to play an important role in understanding today's customer demands, so that it can provide better service and identify ways to improve and innovate.

Objective

Dictionary

The project uses two data sources: the flight data, which contains information about the passengers and the performance of the flights they travelled on, and the survey data, which was collected after the service experience.

Flight data

  1. ID: Customer identification number
  2. Gender: Gender of the passenger (Female, Male)
  3. Customer Type: The customer type (Loyal Customer, Disloyal Customer)
  4. Age: The actual age of the passenger
  5. Type Travel: Purpose of the passenger's flight (Personal or Business travel)
  6. Class: Travel class on the flight (Business, Economy, Eco Plus)
  7. Flight Distance: The flight distance
  8. Departure Delay in Mins: Minutes of delay at departure
  9. Arrival Delay in Mins: Minutes of delay at arrival

Survey data

  1. Satisfaction: Airline satisfaction level (satisfied, or neutral or dissatisfied)
  2. Seat comfort: Satisfaction level of the seat comfort
  3. Departure Arrival Time convenient: Satisfaction level of the convenience of the departure and arrival times
  4. Food Drink: Satisfaction level of the food and drink
  5. Gate Location: Satisfaction level of the gate location
  6. Inflight Wi-Fi service: Satisfaction level of the Wi-Fi service
  7. Inflight entertainment: Satisfaction level of the inflight entertainment
  8. Online Support: Satisfaction level of the online support
  9. Ease of Online booking: Satisfaction level of the ease of online booking
  10. Onboard service: Satisfaction level of the onboard service
  11. Leg room service: Satisfaction level of the leg room service
  12. Baggage handling: Satisfaction level of the baggage handling
  13. Check in service: Satisfaction level of the check-in service
  14. Cleanliness: Satisfaction level of the cleanliness
  15. Online Boarding: Satisfaction level of the online boarding

Importing Libraries

Data Pre-processing

The survey parameters and the binary target need to be encoded numerically to simplify the analysis.
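A minimal sketch of this encoding step, assuming the survey answers are text labels on an ordered scale and the target takes the values "satisfied" and "neutral or dissatisfied". The rating labels, their order, and the column names below are hypothetical placeholders and should be replaced with the labels actually present in the survey data.

```python
import pandas as pd

# Hypothetical ordered rating scale (worst -> best); replace with the real survey labels.
RATING_ORDER = ["extremely poor", "poor", "need improvement",
                "acceptable", "good", "excellent"]
RATING_MAP = {label: score for score, label in enumerate(RATING_ORDER)}

# Assumed target labels: 1 = satisfied, 0 = neutral or dissatisfied.
TARGET_MAP = {"satisfied": 1, "neutral or dissatisfied": 0}


def encode_survey(df: pd.DataFrame, rating_cols: list,
                  target_col: str = "Satisfaction") -> pd.DataFrame:
    """Return a copy with ratings mapped to 0..5 and the target mapped to 0/1."""
    out = df.copy()
    for col in rating_cols:
        # Normalise case/whitespace so " Good" and "good" get the same score;
        # unknown labels and missing answers become NaN.
        out[col] = out[col].str.strip().str.lower().map(RATING_MAP)
    out[target_col] = out[target_col].str.strip().str.lower().map(TARGET_MAP)
    return out


# Toy frame with made-up values, for illustration only.
survey = pd.DataFrame({
    "Seat_comfort": ["Good", "poor", None],
    "Satisfaction": ["satisfied", "neutral or dissatisfied", "satisfied"],
})
encoded = encode_survey(survey, ["Seat_comfort"])
print(encoded)
```

An ordinal mapping (rather than one-hot encoding) keeps the natural order of the ratings, which is what allows the group averages and correlations used later in the notebook.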

Columns Processing

The data columns will be split to separate the variables that the company monitors from those that the business controls, so the parameters fall into two groups.

Variables the company wants to control in order to retain customers:

And variables controlled by the business that have an impact on the company's profits:

Facilities Average Statistics Summary

Online Service Average Statistics Summary

Inflight Average Statistics Summary
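A sketch of how the three averages and their summary statistics could be computed, assuming the ratings are already encoded numerically. The grouping below is inferred from the multivariate sections of this notebook (Facilities: gate location, onboard service, baggage handling, check-in service; Online Service: online support, ease of online booking, online boarding; Inflight: the remaining survey items); the column names are assumptions.

```python
import pandas as pd

# Grouping inferred from the notebook's multivariate sections; column names are assumed.
SERVICE_GROUPS = {
    "Facilities_Average": ["Gate_location", "Onboard_service",
                           "Baggage_handling", "Checkin_service"],
    "Online_Service_Average": ["Online_support", "Ease_of_Onlinebooking",
                               "Online_boarding"],
    "Inflight_Average": ["Seat_comfort", "Departure_Arrival_time_convenient",
                         "Food_drink", "Inflight_wifi_service",
                         "Inflight_entertainment", "Legroom_service",
                         "Cleanliness"],
}


def add_group_averages(df: pd.DataFrame) -> pd.DataFrame:
    """Add one column per group holding the row-wise mean of its encoded ratings."""
    out = df.copy()
    for name, cols in SERVICE_GROUPS.items():
        # mean(axis=1) skips NaN, so one missing answer does not void the whole average.
        out[name] = out[cols].mean(axis=1)
    return out


def group_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std and quartiles of the three averages, one row per group."""
    return df[list(SERVICE_GROUPS)].describe().T
```

Calling `group_summary(add_group_averages(encoded_df))` on the encoded survey data would give one row of statistics for each of the three summaries above.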

EDA

Univariate Analysis

Observations from Age

Observations from Flight Distance

Observations from Departure Delay in minutes

Observations from Arrival Delay in minutes

EDA for Categorical Variables

Observations from Gender

Observations from Customer Type

Observations from Type Travel

Observations from Class

Observations from Satisfaction

Observations from Seat Comfort

Observations from Departure Arrival Time Convenient

Observations from Food drink

Observations from Gate Location

Observations from Inflight Wi-Fi service

Observations from Inflight entertainment

Observations from Online Support

Observations from Ease of Online booking

Observations from Onboard service

Observations from Leg room service

Observations from Baggage handling

Observations from Check in Service

Observations from Cleanliness

Observations from Online Boarding

Observations from Facilities Average

Observations from Online Service

Observations from Inflight Services

Bivariate Analysis

Correlation Matrix
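A small sketch for reading the correlation matrix against the target, assuming the target and survey ratings are already numeric. Spearman correlation is chosen here as an assumption because the survey ratings are ordinal; Pearson (`method="pearson"`) is the pandas default.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "Satisfaction",
                        method: str = "spearman") -> pd.Series:
    """Correlation of each numeric column with the target, strongest (by absolute value) first."""
    corr = df.corr(method=method, numeric_only=True)  # non-numeric columns are ignored
    col = corr[target].drop(target)
    # Sort by absolute value so strong negative relationships also rank high.
    return col.reindex(col.abs().sort_values(ascending=False).index)
```

The full matrix `df.corr(numeric_only=True)` is what a heatmap would display; this helper simply ranks the row that matters most for the target.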

Observations

Observations

Bivariate Analysis for categorical variables

Satisfaction vs Gender

Satisfaction vs Customer Type

Let's check why loyal and disloyal customers are not satisfied.

Satisfaction vs Type Travel

Let's check why customers are neutral or dissatisfied with the airline service, by Type Travel.

Satisfaction vs Class

Satisfaction vs Seat Comfort

Satisfaction vs Departure Arrival Time convenient

Satisfaction vs Food Drink

Satisfaction vs Gate location

Satisfaction vs Inflight Wi-Fi

Satisfaction vs Inflight Entertainment

Satisfaction vs Online Support

Satisfaction vs Ease of Online booking

Satisfaction vs Onboard Service

Satisfaction vs Leg room service

Satisfaction vs Baggage Handling

Satisfaction vs Check in service

Satisfaction vs Cleanliness

Satisfaction vs Online Boarding

Survey Variables Summary

Facilities Average vs Satisfaction

Online Service Average vs Satisfaction

Inflight Service Average vs Satisfaction

Analysis of Bivariate numerical variables

Satisfaction vs Age

Histogram in more detail

Let's check what makes them satisfied or dissatisfied.

Satisfaction vs Flight distance

Satisfaction vs Departure Delay in Mins

Satisfaction vs Arrival Delay in Mins

Multivariate Analysis

Customers Information

Flight distance vs Age vs Satisfaction

Departure Delay in Mins vs Age vs Satisfaction

Arrival Delay in Mins vs Age vs Satisfaction

Type Travel vs Age vs Gender vs Satisfaction

Customer Type vs Age vs Gender vs Satisfaction

Customer Type vs Flight Distance vs Gender vs Satisfaction

Type Travel vs Class vs Age vs Satisfaction

Customer Type vs Class vs Age vs Satisfaction

Flight distance vs Type of Travel vs Class vs Satisfaction

Flight distance vs Customer Type vs Class vs Satisfaction

Departure Delay in Mins vs Type of Travel vs Class vs Satisfaction

Arrival Delay in Mins vs Type of Travel vs Class vs Satisfaction

Age vs Facilities vs Satisfaction

Gate Location vs Age vs Satisfaction

Onboard Service vs Age vs Satisfaction

Baggage Handling vs Age vs Satisfaction

Check-in Service vs Age vs Satisfaction

Facilities Average vs Age vs Satisfaction

Age vs Online Service vs Satisfaction

Online Support vs Age vs Satisfaction

Ease of Online Booking vs Age vs Satisfaction

Online boarding vs Age vs Satisfaction

Online Service Average vs Age vs Satisfaction

Age vs In-Flight vs Satisfaction

Seat comfort vs Age vs Satisfaction

Departure Arrival Time convenient vs Age vs Satisfaction

Food Drink vs Age vs Satisfaction

Inflight Wi-Fi service vs Age vs Satisfaction

Inflight Entertainment vs Age vs Satisfaction

Leg Room vs Age vs Satisfaction

Cleanliness vs Age vs Satisfaction

Inflight Average vs Age vs Satisfaction

Inflight Entertainment vs Flight Distance vs Type Travel vs Satisfaction

Inflight Entertainment vs Age vs Gender vs Satisfaction

Conclusions

Let's find the percentage of outliers in each column of the data using the interquartile range (IQR).
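A sketch of the IQR rule: a value counts as an outlier when it lies below Q1 - 1.5·IQR or above Q3 + 1.5·IQR, with IQR = Q3 - Q1. The 1.5 multiplier is the conventional default and is exposed as the parameter `k`; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def outlier_percentage(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Percentage of rows outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    num = df.select_dtypes(include=np.number)
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons align on column names; NaN compares False, so missing values
    # are not counted as outliers (but they still count in the denominator).
    is_outlier = (num < lower) | (num > upper)
    return is_outlier.mean() * 100
```

For a column with values 1, 2, 3, 4, 100, Q1 = 2 and Q3 = 4, so the upper fence is 7 and one value in five (20%) is flagged.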

Data Preparation for Modeling

Missing-Value Treatment

Imputing missing values
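The notebook does not state which imputation strategy is used; the sketch below assumes a common choice, the median for numeric columns and the most frequent value for the rest, learned from one frame (ideally the training split) and applied to any other. Function names are hypothetical.

```python
import pandas as pd


def fit_imputation_values(train: pd.DataFrame) -> dict:
    """Learn fill values: median for numeric columns, mode for the others."""
    fill = {}
    for col in train.columns:
        if pd.api.types.is_numeric_dtype(train[col]):
            fill[col] = train[col].median()
        else:
            # Assumes each column has at least one non-missing value.
            fill[col] = train[col].mode(dropna=True).iloc[0]
    return fill


def impute(df: pd.DataFrame, fill: dict) -> pd.DataFrame:
    """Fill missing values column by column using previously learned values."""
    return df.fillna(value=fill)
```

The median is less sensitive than the mean to the skewed, outlier-heavy columns identified above (such as the delay columns).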

Feature Scaling

I need to bring all of the features of the Machine Learning problem to a similar scale or range. Feature scaling can significantly affect a Machine Learning model's training efficiency and can reduce the time it takes to train the model.
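A minimal sketch, assuming min-max scaling to [0, 1]; the notebook does not name its scaler. Min-max scaling is picked here because the chi-square selection step requires non-negative features (standard scaling with `StandardScaler` is the usual alternative). The scaler is fitted on training data only and reused for the other splits, so no information from them leaks into training. The arrays are toy stand-ins for columns such as Age and Flight Distance.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy numeric features standing in for e.g. Age and Flight Distance.
X_train = np.array([[20.0, 500.0], [40.0, 1500.0], [60.0, 2500.0]])
X_val = np.array([[30.0, 1000.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn per-column min/max from training data
X_val_scaled = scaler.transform(X_val)          # reuse the training min/max
print(X_train_scaled)
print(X_val_scaled)
```

Validation or test values outside the training range map outside [0, 1]; `MinMaxScaler(clip=True)` clips them if that matters for the next step.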

Feature selection through the Chi-Square method
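A sketch of chi-square ranking with scikit-learn's `SelectKBest` and `chi2`, which score each non-negative feature against a categorical target; higher scores (lower p-values) suggest a stronger dependence. The helper name is hypothetical, and `k="all"` keeps every feature so that the full ranking can be inspected before deciding what to drop.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2


def chi2_rank(X: pd.DataFrame, y, k="all") -> pd.DataFrame:
    """Rank non-negative features by chi-square score against a categorical target."""
    selector = SelectKBest(score_func=chi2, k=k)
    selector.fit(X, y)
    return (pd.DataFrame({"feature": X.columns,
                          "chi2": selector.scores_,
                          "p_value": selector.pvalues_})
            .sort_values("chi2", ascending=False)
            .reset_index(drop=True))
```

`chi2` raises an error on negative inputs, so it has to run on encoded ratings or min-max-scaled features, not standardized ones.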

Feature selection through Lasso
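A sketch of Lasso-based selection: the L1 penalty drives the coefficients of weak features to exactly zero, and the features with non-zero coefficients are kept. The `alpha` value and the synthetic data are assumptions for illustration. Because the target is binary, this treats it as a linear probability model; `LogisticRegression(penalty="l1", solver="liblinear")` is the classification counterpart.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso


def lasso_selected_features(X: pd.DataFrame, y, alpha: float = 0.1):
    """Fit a Lasso on scaled features; return all coefficients and the non-zero ones."""
    model = Lasso(alpha=alpha)
    model.fit(X, y)
    coefs = pd.Series(model.coef_, index=X.columns)
    selected = coefs[coefs != 0].index.tolist()
    return coefs, selected


# Synthetic example: only "signal" determines the binary target.
rng = np.random.default_rng(0)
n = 500
signal = rng.standard_normal(n)
X_demo = pd.DataFrame({"signal": signal,
                       "noise_1": rng.standard_normal(n),
                       "noise_2": rng.standard_normal(n)})
y_demo = (signal > 0).astype(int)
coefs, selected = lasso_selected_features(X_demo, y_demo)
print(coefs.round(3))
print("kept:", selected)
```

Larger `alpha` values remove more features; `LassoCV` can choose it by cross-validation instead of fixing it by hand.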

Dropping the least important features for Analysis

Splitting data
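The notebook later reports train, validation, and test performance, so the data is split three ways. The sketch below assumes a 60/20/20 split with stratification on the target, so that each split keeps the same share of satisfied customers; the proportions and seed are assumptions.

```python
from sklearn.model_selection import train_test_split


def split_train_val_test(X, y, val_size=0.2, test_size=0.2, seed=1):
    """Stratified three-way split; sizes are fractions of the full data set."""
    # First carve out the test set.
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed)
    # Rescale so val_size is a fraction of the full data, not of X_temp.
    rel_val = val_size / (1 - test_size)
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=rel_val, stratify=y_temp, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```

Splitting before imputation and scaling lets those steps learn their statistics from the training split alone.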

Encoding Categorical Variables
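A sketch of one-hot encoding with `pandas.get_dummies`, assuming the categorical flight-data columns are named as below (hypothetical names). The column layout is fixed by the training split, and the validation and test splits are reindexed to it so every model sees the same columns.

```python
import pandas as pd

# Assumed names of the categorical columns in the flight data.
CATEGORICAL_COLS = ["Gender", "Customer_Type", "Type_Travel", "Class"]


def one_hot_encode(train: pd.DataFrame, others: list, cols=None):
    """One-hot encode `cols`, fixing the column layout on `train` only."""
    cols = list(CATEGORICAL_COLS if cols is None else cols)
    # drop_first removes one redundant dummy per variable in the training layout.
    train_enc = pd.get_dummies(train, columns=cols, drop_first=True, dtype=int)
    others_enc = [
        # Encode every level, then keep exactly the training columns:
        # levels unseen in training are dropped, absent ones are filled with 0.
        pd.get_dummies(df, columns=cols, dtype=int)
          .reindex(columns=train_enc.columns, fill_value=0)
        for df in others
    ]
    return train_enc, others_enc
```

The other splits are deliberately encoded without `drop_first`: if a split lacks the first level, `drop_first` would drop a different column there and silently misalign the dummies.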

Building Models

Model evaluation criterion

The model can make wrong predictions in two ways:

  1. Predicting that a customer will be satisfied with the airline when the customer is not
  2. Predicting that a customer will be neutral or dissatisfied when the customer is actually satisfied

Which case is more important?

How can this loss be reduced, i.e., how can the number of False Negatives be reduced?
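Assuming the class whose misclassification is costlier is coded as the positive class (so that the costly error is a false negative), reducing False Negatives is equivalent to maximising recall, where TP and FN are the true-positive and false-negative counts:

```latex
\text{Recall} = \frac{TP}{TP + FN}
```

For instance, with illustrative counts of 80 true positives and 20 false negatives, recall is 80 / (80 + 20) = 0.8; every false negative removed pushes it closer to 1.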

First, let's create two functions to calculate the different metrics and the confusion matrix, so that we don't have to repeat the same code for each model.
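A sketch of what the two helpers could look like using `sklearn.metrics`. The function names are hypothetical, both assume a fitted model with a `predict` method and a target encoded as 0/1, and the confusion matrix is returned as a labelled DataFrame rather than drawn, to keep the sketch free of plotting code.

```python
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def model_performance_classification(model, X, y_true) -> pd.DataFrame:
    """Accuracy, recall, precision and F1 of a fitted classifier as a one-row DataFrame."""
    y_pred = model.predict(X)
    return pd.DataFrame({
        "Accuracy": [accuracy_score(y_true, y_pred)],
        "Recall": [recall_score(y_true, y_pred)],
        "Precision": [precision_score(y_true, y_pred)],
        "F1": [f1_score(y_true, y_pred)],
    })


def confusion_matrix_table(model, X, y_true, normalize: bool = False) -> pd.DataFrame:
    """Confusion matrix as a labelled DataFrame (counts, or shares of all rows)."""
    y_pred = model.predict(X)
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1],
                          normalize="all" if normalize else None)
    return pd.DataFrame(cm, index=["actual 0", "actual 1"],
                        columns=["predicted 0", "predicted 1"])
```

The one-row frames from several models can be stacked with `pd.concat` for the performance comparisons below, and the table can be drawn with `seaborn.heatmap(table, annot=True)`.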

Model Building

Performance comparison

Hyperparameter Tuning
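The final test section refers to a randomized search on a Random Forest. The sketch below assumes scikit-learn's `RandomizedSearchCV` scored on recall, in line with the evaluation criterion above; the parameter grid, `n_iter`, `cv`, and the synthetic data are assumptions, not the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical search space; widen it for the real data.
PARAM_DISTRIBUTIONS = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
    "max_features": ["sqrt", 0.5],
}


def tune_random_forest(X_train, y_train, n_iter=5, seed=1):
    """Randomly sample n_iter parameter combinations and keep the one with the best CV recall."""
    search = RandomizedSearchCV(
        estimator=RandomForestClassifier(random_state=seed),
        param_distributions=PARAM_DISTRIBUTIONS,
        n_iter=n_iter,
        scoring="recall",  # favour fewer false negatives
        cv=3,
        random_state=seed,
    )
    search.fit(X_train, y_train)
    return search


X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=1)
search = tune_random_forest(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

A randomized search evaluates a fixed number of sampled combinations instead of the full grid, which keeps tuning affordable on roughly 90,000 survey rows; `search.best_estimator_` is already refitted on the full training data.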

Model Performance Comparison

Train Performance Comparison

Validation Performance Comparison

Test Performance of Random Forest (Randomized Search)